feat: v0.1.0 release polish - critical and major fixes #4

Merged
spignotti merged 6 commits into main from feat/v0.1.0-polish
Mar 23, 2026
@spignotti
Owner

Summary

This PR implements all critical and major fixes from FEATURE.md to prepare litresearch for v0.1.0 publication.

Critical Fixes

  1. JSON parsing error handling - Wrapped json.loads() in both _screen_paper and _analyze_paper with try/except JSONDecodeError. On a malformed LLM response, each function now prints a warning and returns None instead of crashing.

  2. Semantic Scholar timeout/retry - Added s2_timeout config setting (default 10s). S2 client is now created with timeout=settings.s2_timeout, retry=False in both discovery and enrichment stages to prevent 14-minute hangs.

  3. PDF double-download prevention - PDFs are now saved to the papers/ directory during the analysis stage and marked pdf_downloaded=True. The export stage skips papers that are already downloaded.
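
As a rough illustration of the first fix, the guard around json.loads() might look like the sketch below. The helper name parse_llm_json and its context parameter are hypothetical, chosen for this example; the PR applies the same try/except directly inside _screen_paper and _analyze_paper.

```python
import json
from typing import Any, Optional


def parse_llm_json(raw: str, context: str = "response") -> Optional[Any]:
    """Parse an LLM reply as JSON; warn and return None if it is malformed.

    Hypothetical helper for illustration only, not the function used in the PR.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Report the problem and let the caller skip this paper
        # instead of aborting the whole pipeline run.
        print(f"Warning: could not parse LLM {context}: {exc}")
        return None


# A well-formed reply parses normally; a malformed one yields None.
good = parse_llm_json('{"relevance_score": 80}', context="screening result")
bad = parse_llm_json("Sure! Here is the JSON you asked for", context="screening result")
```

Callers then treat None as "skip this paper", which keeps one bad response from ending the run.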

Major Fixes

  1. Immutable Settings construction - Refactored _build_settings() to use Settings(**overrides) pattern instead of post-init mutation.

  2. Output directory collision handling - Added --overwrite flag and auto-increment logic. When output directory exists and is populated, automatically uses output-2, output-3, etc.

  3. No-abstract paper handling - Papers without abstracts now get a ScreeningResult with relevance_score=0 and rationale="no abstract available" instead of being silently skipped.

  4. LLMError handling in query_gen - Wrapped call_llm() in try/except with clear error message on failure.

  5. Config file hygiene - Renamed litresearch.toml to litresearch.toml.example (already in .gitignore).

  6. HTML entity unescaping - Applied html.unescape() to title, venue, and abstract fields in Paper.from_s2().

  7. Stage-level test coverage - Added 3 new test files covering the following stages:

    • test_stages_query_gen.py - query generation and error handling
    • test_stages_screening.py - screening behavior and no-abstract handling
    • test_stages_discovery.py - S2 client config and deduplication
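
The output directory collision handling from item 2 can be sketched roughly as follows. This is a minimal sketch under assumptions: the function name resolve_output_dir is hypothetical, and it assumes that --overwrite simply reuses the existing directory; the PR's actual logic may differ in detail.

```python
from pathlib import Path


def _is_populated(path: Path) -> bool:
    """True if path is an existing directory with at least one entry."""
    return path.is_dir() and any(path.iterdir())


def resolve_output_dir(base: Path, overwrite: bool = False) -> Path:
    """Pick a directory to write to without clobbering earlier results.

    Hypothetical helper: returns base if it is missing or empty (or if
    overwrite is set), otherwise the first of base-2, base-3, ... that
    is missing or empty.
    """
    if overwrite or not _is_populated(base):
        return base
    suffix = 2
    while True:
        candidate = base.with_name(f"{base.name}-{suffix}")
        if not _is_populated(candidate):
            return candidate
        suffix += 1
```

With a populated output directory, resolve_output_dir(Path("output")) would pick output-2, while passing overwrite=True returns output unchanged.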

Minor Polish

  1. Added BATCH_SIZE comment documenting S2 batch endpoint limit
  2. Added run summary block showing timing and counts at pipeline completion
  3. Changed screening_threshold default from 40 to 60 with documentation

Validation

All nox sessions pass:

  • ✅ lint (ruff check + format)
  • ✅ typecheck (pyright, 0 errors)
  • ✅ test (23 tests passed)

Commits

  • fix: critical issues - JSON parsing, S2 timeout, PDF deduplication
  • fix(cli): immutable settings construction and output collision handling
  • fix: handle no-abstract papers and LLMError in query_gen
  • fix: rename litresearch.toml to example and unescape HTML entities
  • test: add stage-level tests for query_gen, screening, discovery
  • chore: minor polish - comments, summary output, threshold default

- Guard json.loads() in analysis.py with try/except JSONDecodeError
- Add s2_timeout config setting (default 10s) with retry=False for S2 client
- Prevent PDF double-download by saving during analysis and marking pdf_downloaded
- Skip already-downloaded PDFs in export stage
- Refactor _build_settings to use immutable Settings(**overrides) pattern
- Add --overwrite flag to run command
- Auto-increment output directory name when directory exists and is populated
- Add tests for collision detection and overwrite behavior
- Write ScreeningResult with score=0 for papers without abstract
- Wrap call_llm in try/except LLMError in query_gen with clear error message
- Rename litresearch.toml to litresearch.toml.example (git mv)
- Add html.unescape() for title, abstract, venue in Paper.from_s2()
- Test query generation with successful LLM response and error handling
- Test screening behavior for no-abstract papers and JSON parse failures
- Test discovery S2 client configuration and paper deduplication
- Add comment for BATCH_SIZE in enrichment.py
- Add run summary block in pipeline.py with timing and counts
- Change screening_threshold default from 40 to 60 with documentation
@spignotti spignotti merged commit 7fd0ebd into main Mar 23, 2026
2 checks passed
@spignotti spignotti deleted the feat/v0.1.0-polish branch March 23, 2026 15:12